StrAl: progressive alignment of non-coding RNA using base pairing probability vectors in quadratic time

Authors

  • Deniz Dalli
  • Andreas Wilm
  • Indra Mainz
  • Gerhard Steger
Abstract

MOTIVATION: Alignment of RNA has a wide range of applications, for example in phylogeny inference, consensus structure prediction and homology searches. Yet aligning structural or non-coding RNAs (ncRNAs) correctly is notoriously difficult as these RNA sequences may evolve by compensatory mutations, which maintain base pairing but destroy sequence homology. Ideally, alignment programs would take RNA structure into account. The Sankoff algorithm for the simultaneous solution of RNA structure prediction and RNA sequence alignment was proposed 20 years ago but suffers from its exponential complexity. A number of programs implement lightweight versions of the Sankoff algorithm by restricting its application to a limited type of structure and/or only pairwise alignment. Thus, despite recent advances, the proper alignment of multiple structural RNA sequences remains a problem.

RESULTS: Here we present StrAl, a heuristic method for alignment of ncRNA that reduces sequence-structure alignment to a two-dimensional problem similar to standard multiple sequence alignment. The scoring function takes into account sequence similarity as well as up- and downstream pairing probability. To test the robustness of the algorithm and the performance of the program, we scored alignments produced by StrAl against a large set of published reference alignments. The quality of alignments predicted by StrAl is far better than that obtained by standard sequence alignment programs, especially when sequence homologies drop below approximately 65%; nevertheless StrAl's runtime is comparable to that of ClustalW.
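The idea of scoring each alignment column by sequence similarity plus upstream and downstream pairing probability can be illustrated with a small pairwise sketch. The code below is not StrAl's implementation or its published scoring function. The names `column_score` and `align`, the weight `alpha`, the match/mismatch values, the linear gap penalty and the square-root form of the structure term are all assumptions chosen for illustration. Each residue carries a hypothetical pair `(p_up, p_down)`, supplied by hand here rather than computed by a folding program. Because only per-position vectors are compared, the dynamic programming table is two-dimensional and the run time is quadratic in sequence length.

```python
from math import sqrt


def column_score(a, b, pa, pb, alpha=1.0):
    """Score aligning residue a with residue b.

    pa and pb are (p_up, p_down) tuples: the assumed probabilities that the
    residue pairs with an upstream or a downstream partner.
    The formula is illustrative only, not StrAl's published scoring function.
    """
    seq = 1.0 if a == b else -1.0                     # toy match/mismatch score
    struct = sqrt(pa[0] * pb[0]) + sqrt(pa[1] * pb[1])  # shared pairing tendency
    return seq + alpha * struct


def align(s, t, ps, pt, gap=-2.0, alpha=1.0):
    """Global pairwise alignment (Needleman-Wunsch, linear gap cost).

    Returns (score, aligned_s, aligned_t). Time and memory are O(len(s) * len(t)).
    """
    n, m = len(s), len(t)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + column_score(
                s[i - 1], t[j - 1], ps[i - 1], pt[j - 1], alpha)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)

    # Trace back from the bottom-right corner, preferring the diagonal move.
    out_s, out_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + column_score(
                s[i - 1], t[j - 1], ps[i - 1], pt[j - 1], alpha):
            out_s.append(s[i - 1]); out_t.append(t[j - 1])
            i -= 1; j -= 1
        elif i > 0 and (j == 0 or dp[i][j] == dp[i - 1][j] + gap):
            out_s.append(s[i - 1]); out_t.append("-")
            i -= 1
        else:
            out_s.append("-"); out_t.append(t[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out_s)), "".join(reversed(out_t))


if __name__ == "__main__":
    # Hand-made hairpin: the 5' stem pairs downstream, the 3' stem upstream.
    hairpin = [(0.0, 0.9)] * 3 + [(0.0, 0.0)] * 3 + [(0.9, 0.0)] * 3
    score, top, bottom = align("GGGAAACCC", "GGGAAACCC", hairpin, hairpin)
    print(score)
    print(top)
    print(bottom)
```

In this sketch a mismatched column can still score well if both residues show the same pairing tendency, which is one simple way to reward compensatory mutations. A progressive multiple alignment would repeat such a step along a guide tree, which the sketch does not attempt.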


Related articles

RNA Structural Alignment with Conditional Random Fields

Computationally identifying non-coding RNA regions in the genome has attracted much attention. However, it is substantially harder than gene finding for protein-coding regions because non-coding RNA sequences do not carry strong statistical signals. Since comparative sequence analysis is effective for non-coding RNA detection, efficient computational methods are expected for st...


Alignment of RNA base pairing probability matrices

MOTIVATION Many classes of functional RNA molecules are characterized by highly conserved secondary structures but little detectable sequence similarity. Reliable multiple alignments can therefore be constructed only when the shared structural features are taken into account. Since multiple alignments are used as input for many subsequent methods of data analysis, structure-based alignments are...


Induction of apoptosis and necrosis in human acute erythroleukemia cells by inhibition of long non-coding RNA PVT1

Recent advances in molecular medicine have proposed new therapeutic strategies for cancer. One line of molecular research into the diagnosis and treatment of cancer is the use of long non-coding RNAs (lncRNAs), a class of non-coding RNA molecules longer than 200 nucleotides that act as key regulators of gene expression. Different aspects of cellular activities like cell...


RNA Base Pairing Probability Alignment by Genetic Algorithm

The Sankoff algorithm is one of the most attractive ideas for predicting consensus RNA secondary structure [1]. In the algorithm, the RNA sequence alignment problem and the RNA folding problem are solved simultaneously by dynamic programming. However, due to its high computational complexity in both time and space, the Sankoff algorithm has not been used to solve practical problems in which usually RNA seq...


Alignment of RNA with Structures of Unlimited Complexity

Sequence-structure alignment of RNA with arbitrary secondary structure is Max-SNP-hard. Therefore, the problem of RNA alignment is commonly restricted to nested structure, where dynamic programming yields efficient solutions. However, nested structure cannot model pseudoknots or even more complex structural dependencies. Nevertheless those dependencies are essential and conserved features of ma...



Journal:
  • Bioinformatics

Volume 22, Issue 13

Pages: -

Published: 2006